Title: Case Study 1: Linear Regression
Authors: Will Butler, Robert (Reuven) Derner
Date: 8/24/23

Business Understanding

A group of scientists studying superconductors has brought us a problem. Superconductors are materials that, below a critical temperature, conduct electrical current with little or no resistance.

The scientists are looking to us to use the data provided to build a model that predicts new superconductors from the properties they have measured so far. The data points include each material's composition and the temperature at which it becomes superconducting. We begin by examining the data set through exploratory data analysis.

The desired model should predict new superconductors and the temperature at which they operate, based on the experimental inputs the scientists have provided. The model needs to be interpretable, so that the scientists can determine not only whether a new material would superconduct but also at what temperature. We will therefore fit a regression model, which gives the scientists ease of interpretation through the relative importance of each feature in the model.
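One common way to read "relative importance" from a linear model is to standardize the features first and then compare the magnitudes of the fitted coefficients. The sketch below illustrates the idea on synthetic data, not on the superconductor data; the feature scales, the true coefficients, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the case-study data): three features on very
# different scales, where only the first two drive the response.
rng = np.random.default_rng(110)
X = rng.normal(size=(500, 3)) * np.array([1.0, 100.0, 0.01])
y = 5.0 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Standardizing first puts every coefficient in "per standard deviation" units,
# so the magnitudes can be compared across features.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)

coefs = model.named_steps["linearregression"].coef_
ranking = np.argsort(-np.abs(coefs))  # feature indices, largest effect first
for idx in ranking:
    print(f"feature_{idx}: {coefs[idx]:+.3f}")
```

Without the scaling step, the raw coefficient of the second feature (0.02) would look negligible even though that feature moves the response by about two units per standard deviation.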

Data Source:

Provided by the client, together with a metadata dictionary that defines the terms.

In [1]:
# Import libraries
import pandas as pd
import seaborn as sns
import numpy as np
from numpy import mean
import matplotlib.pyplot as plt
import random
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio 
from tabulate import tabulate
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.metrics import classification_report, confusion_matrix,mean_squared_error

# Workbook settings
pd.set_option('display.max_columns', None)
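random.seed(110)
np.random.seed(110)  # NumPy and scikit-learn draw from NumPy's generator, not Python's random module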
pio.renderers.default='notebook'
In [2]:
# Import data from github (next phase)
url = 'https://raw.githubusercontent.com/ReuvenDerner/MSDS_QuantifyingTheWorld/main/train.csv'
data = pd.read_csv(url, encoding = "utf-8")
In [3]:
# Import the per-material element composition data from github (next phase)
url2 = 'https://raw.githubusercontent.com/ReuvenDerner/MSDS_QuantifyingTheWorld/main/unique_m.csv'
unique_m = pd.read_csv(url2, encoding = "utf-8")
In [4]:
# Local import (to be removed later)
# data = pd.read_csv("C:/Users/robert.derner/OneDrive - Flagship Credit Acceptance/Documents/School/Quantifying The World/Case Study One/train.csv")

Data Meaning Type

Describe the meaning and type of data (scale, values, etc.) for each attribute in the data file.

| Response Features | Definition |
| --- | --- |
| critical_temp | The critical temperature at which the material becomes a superconductor |

| Categorical Features | Definition |
| --- | --- |
| number_of_elements | The number of chemical elements contained in the superconductor |

| Continuous Features | Definition |
| --- | --- |
| mean_atomic_mass | Average of atomic mass |
| wtd_mean_atomic_mass | Weighted average of atomic mass |
| gmean_atomic_mass | Geometric average (g-average) of atomic mass |
| wtd_gmean_atomic_mass | Weighted geometric average of atomic mass |
| entropy_atomic_mass | Degree of disorder or uncertainty (entropy) of atomic mass |
| wtd_entropy_atomic_mass | Weighted degree of disorder or uncertainty (entropy) of atomic mass |
| range_atomic_mass | Range of atomic mass |
| wtd_range_atomic_mass | Weighted range of atomic mass |
| std_atomic_mass | Standard deviation of atomic mass |
| wtd_std_atomic_mass | Weighted standard deviation of atomic mass |
| mean_fie | Average of first ionization energy (FIE) |
| wtd_mean_fie | Weighted average of FIE |
| gmean_fie | Geometric average of FIE |
| wtd_gmean_fie | Weighted geometric average of FIE |
| entropy_fie | Degree of disorder or uncertainty (entropy) of FIE |
| wtd_entropy_fie | Weighted degree of disorder or uncertainty (entropy) of FIE |
| range_fie | Range of FIE |
| wtd_range_fie | Weighted range of FIE |
| std_fie | Standard deviation of FIE |
| wtd_std_fie | Weighted standard deviation of FIE |
| mean_atomic_radius | Average of atomic radius |
| wtd_mean_atomic_radius | Weighted average of atomic radius |
| gmean_atomic_radius | Geometric average of atomic radius |
| wtd_gmean_atomic_radius | Weighted geometric average of atomic radius |
| entropy_atomic_radius | Degree of disorder or uncertainty (entropy) of atomic radius |
| wtd_entropy_atomic_radius | Weighted degree of disorder or uncertainty (entropy) of atomic radius |
| range_atomic_radius | Range of atomic radius |
| wtd_range_atomic_radius | Weighted range of atomic radius |
| std_atomic_radius | Standard deviation of atomic radius |
| wtd_std_atomic_radius | Weighted standard deviation of atomic radius |
| mean_Density | Average of density |
| wtd_mean_Density | Weighted average of density |
| gmean_Density | Geometric average of density |
| wtd_gmean_Density | Weighted geometric average of density |
| entropy_Density | Degree of disorder or uncertainty (entropy) of density |
| wtd_entropy_Density | Weighted degree of disorder or uncertainty (entropy) of density |
| range_Density | Range of density |
| wtd_range_Density | Weighted range of density |
| std_Density | Standard deviation of density |
| wtd_std_Density | Weighted standard deviation of density |
| mean_ElectronAffinity | Average of electron affinity |
| wtd_mean_ElectronAffinity | Weighted average of electron affinity |
| gmean_ElectronAffinity | Geometric average of electron affinity |
| wtd_gmean_ElectronAffinity | Weighted geometric average of electron affinity |
| entropy_ElectronAffinity | Degree of disorder or uncertainty (entropy) of electron affinity |
| wtd_entropy_ElectronAffinity | Weighted degree of disorder or uncertainty (entropy) of electron affinity |
| range_ElectronAffinity | Range of electron affinity |
| wtd_range_ElectronAffinity | Weighted range of electron affinity |
| std_ElectronAffinity | Standard deviation of electron affinity |
| wtd_std_ElectronAffinity | Weighted standard deviation of electron affinity |
| mean_FusionHeat | Average of fusion heat |
| wtd_mean_FusionHeat | Weighted average of fusion heat |
| gmean_FusionHeat | Geometric average of fusion heat |
| wtd_gmean_FusionHeat | Weighted geometric average of fusion heat |
| entropy_FusionHeat | Degree of disorder or uncertainty (entropy) of fusion heat |
| wtd_entropy_FusionHeat | Weighted degree of disorder or uncertainty (entropy) of fusion heat |
| range_FusionHeat | Range of fusion heat |
| wtd_range_FusionHeat | Weighted range of fusion heat |
| std_FusionHeat | Standard deviation of fusion heat |
| wtd_std_FusionHeat | Weighted standard deviation of fusion heat |
| mean_ThermalConductivity | Average of thermal conductivity |
| wtd_mean_ThermalConductivity | Weighted average of thermal conductivity |
| gmean_ThermalConductivity | Geometric average of thermal conductivity |
| wtd_gmean_ThermalConductivity | Weighted geometric average of thermal conductivity |
| entropy_ThermalConductivity | Degree of disorder or uncertainty (entropy) of thermal conductivity |
| wtd_entropy_ThermalConductivity | Weighted degree of disorder or uncertainty (entropy) of thermal conductivity |
| range_ThermalConductivity | Range of thermal conductivity |
| wtd_range_ThermalConductivity | Weighted range of thermal conductivity |
| std_ThermalConductivity | Standard deviation of thermal conductivity |
| wtd_std_ThermalConductivity | Weighted standard deviation of thermal conductivity |
| mean_Valence | Average of valence |
| wtd_mean_Valence | Weighted average of valence |
| gmean_Valence | Geometric average of valence |
| wtd_gmean_Valence | Weighted geometric average of valence |
| entropy_Valence | Degree of disorder or uncertainty (entropy) of valence |
| wtd_entropy_Valence | Weighted degree of disorder or uncertainty (entropy) of valence |
| range_Valence | Range of valence |
| wtd_range_Valence | Weighted range of valence |
| std_Valence | Standard deviation of valence |
| wtd_std_Valence | Weighted standard deviation of valence |
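To make the definitions concrete, the sketch below recomputes several of the atomic-mass features for the material Ba0.2La1.8Cu1O4, the first record in the data. The atomic masses are assumed standard values and the formulas are our reading of the feature names (population standard deviation, entropy taken over the mass shares); neither comes from the client's metadata dictionary. The results agree closely with the values pandas displays for that row.

```python
import math

# Assumed standard atomic masses (not supplied with the data) and the element
# amounts read off the formula Ba0.2 La1.8 Cu1 O4.
elements = ["Ba", "La", "Cu", "O"]
masses = {"Ba": 137.327, "La": 138.90547, "Cu": 63.546, "O": 15.9994}
amounts = {"Ba": 0.2, "La": 1.8, "Cu": 1.0, "O": 4.0}

t = [masses[e] for e in elements]
total_amount = sum(amounts.values())
p = [amounts[e] / total_amount for e in elements]  # element fractions, sum to 1
n = len(t)

mean_mass = sum(t) / n                                     # mean_atomic_mass
wtd_mean_mass = sum(pi * ti for pi, ti in zip(p, t))       # wtd_mean_atomic_mass
gmean_mass = math.exp(sum(math.log(ti) for ti in t) / n)   # gmean_atomic_mass
range_mass = max(t) - min(t)                               # range_atomic_mass
std_mass = math.sqrt(sum((ti - mean_mass) ** 2 for ti in t) / n)  # std_atomic_mass

# Entropy of the mass shares w_i = t_i / sum(t)
w = [ti / sum(t) for ti in t]
entropy_mass = -sum(wi * math.log(wi) for wi in w)         # entropy_atomic_mass

for name, value in [("mean", mean_mass), ("wtd_mean", wtd_mean_mass),
                    ("gmean", gmean_mass), ("range", range_mass),
                    ("std", std_mass), ("entropy", entropy_mass)]:
    print(f"{name}_atomic_mass: {value:.6f}")
```

The unweighted features treat every element in the formula equally, while the weighted ones scale each element by its share of the composition, which is why oxygen pulls the weighted mean well below the plain mean.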

Data Quality

In [5]:
data.head()
Out[5]:
number_of_elements mean_atomic_mass wtd_mean_atomic_mass gmean_atomic_mass wtd_gmean_atomic_mass entropy_atomic_mass wtd_entropy_atomic_mass range_atomic_mass wtd_range_atomic_mass std_atomic_mass wtd_std_atomic_mass mean_fie wtd_mean_fie gmean_fie wtd_gmean_fie entropy_fie wtd_entropy_fie range_fie wtd_range_fie std_fie wtd_std_fie mean_atomic_radius wtd_mean_atomic_radius gmean_atomic_radius wtd_gmean_atomic_radius entropy_atomic_radius wtd_entropy_atomic_radius range_atomic_radius wtd_range_atomic_radius std_atomic_radius wtd_std_atomic_radius mean_Density wtd_mean_Density gmean_Density wtd_gmean_Density entropy_Density wtd_entropy_Density range_Density wtd_range_Density std_Density wtd_std_Density mean_ElectronAffinity wtd_mean_ElectronAffinity gmean_ElectronAffinity wtd_gmean_ElectronAffinity entropy_ElectronAffinity wtd_entropy_ElectronAffinity range_ElectronAffinity wtd_range_ElectronAffinity std_ElectronAffinity wtd_std_ElectronAffinity mean_FusionHeat wtd_mean_FusionHeat gmean_FusionHeat wtd_gmean_FusionHeat entropy_FusionHeat wtd_entropy_FusionHeat range_FusionHeat wtd_range_FusionHeat std_FusionHeat wtd_std_FusionHeat mean_ThermalConductivity wtd_mean_ThermalConductivity gmean_ThermalConductivity wtd_gmean_ThermalConductivity entropy_ThermalConductivity wtd_entropy_ThermalConductivity range_ThermalConductivity wtd_range_ThermalConductivity std_ThermalConductivity wtd_std_ThermalConductivity mean_Valence wtd_mean_Valence gmean_Valence wtd_gmean_Valence entropy_Valence wtd_entropy_Valence range_Valence wtd_range_Valence std_Valence wtd_std_Valence critical_temp
0 4 88.944468 57.862692 66.361592 36.116612 1.181795 1.062396 122.90607 31.794921 51.968828 53.622535 775.425 1010.268571 718.152900 938.016780 1.305967 0.791488 810.6 735.985714 323.811808 355.562967 160.25 105.514286 136.126003 84.528423 1.259244 1.207040 205 42.914286 75.237540 69.235569 4654.35725 2961.502286 724.953211 53.543811 1.033129 0.814598 8958.571 1579.583429 3306.162897 3572.596624 81.8375 111.727143 60.123179 99.414682 1.159687 0.787382 127.05 80.987143 51.433712 42.558396 6.9055 3.846857 3.479475 1.040986 1.088575 0.994998 12.878 1.744571 4.599064 4.666920 107.756645 61.015189 7.062488 0.621979 0.308148 0.262848 399.97342 57.127669 168.854244 138.517163 2.25 2.257143 2.213364 2.219783 1.368922 1.066221 1 1.085714 0.433013 0.437059 29.0
1 5 92.729214 58.518416 73.132787 36.396602 1.449309 1.057755 122.90607 36.161939 47.094633 53.979870 766.440 1010.612857 720.605511 938.745413 1.544145 0.807078 810.6 743.164286 290.183029 354.963511 161.20 104.971429 141.465215 84.370167 1.508328 1.204115 205 50.571429 67.321319 68.008817 5821.48580 3021.016571 1237.095080 54.095718 1.314442 0.914802 10488.571 1667.383429 3767.403176 3632.649185 90.8900 112.316429 69.833315 101.166398 1.427997 0.838666 127.05 81.207857 49.438167 41.667621 7.7844 3.796857 4.403790 1.035251 1.374977 1.073094 12.878 1.595714 4.473363 4.603000 172.205316 61.372331 16.064228 0.619735 0.847404 0.567706 429.97342 51.413383 198.554600 139.630922 2.00 2.257143 1.888175 2.210679 1.557113 1.047221 2 1.128571 0.632456 0.468606 26.0
2 4 88.944468 57.885242 66.361592 36.122509 1.181795 0.975980 122.90607 35.741099 51.968828 53.656268 775.425 1010.820000 718.152900 939.009036 1.305967 0.773620 810.6 743.164286 323.811808 354.804183 160.25 104.685714 136.126003 84.214573 1.259244 1.132547 205 49.314286 75.237540 67.797712 4654.35725 2999.159429 724.953211 53.974022 1.033129 0.760305 8958.571 1667.383429 3306.162897 3592.019281 81.8375 112.213571 60.123179 101.082152 1.159687 0.786007 127.05 81.207857 51.433712 41.639878 6.9055 3.822571 3.479475 1.037439 1.088575 0.927479 12.878 1.757143 4.599064 4.649635 107.756645 60.943760 7.062488 0.619095 0.308148 0.250477 399.97342 57.127669 168.854244 138.540613 2.25 2.271429 2.213364 2.232679 1.368922 1.029175 1 1.114286 0.433013 0.444697 19.0
3 4 88.944468 57.873967 66.361592 36.119560 1.181795 1.022291 122.90607 33.768010 51.968828 53.639405 775.425 1010.544286 718.152900 938.512777 1.305967 0.783207 810.6 739.575000 323.811808 355.183884 160.25 105.100000 136.126003 84.371352 1.259244 1.173033 205 46.114286 75.237540 68.521665 4654.35725 2980.330857 724.953211 53.758486 1.033129 0.788889 8958.571 1623.483429 3306.162897 3582.370597 81.8375 111.970357 60.123179 100.244950 1.159687 0.786900 127.05 81.097500 51.433712 42.102344 6.9055 3.834714 3.479475 1.039211 1.088575 0.964031 12.878 1.744571 4.599064 4.658301 107.756645 60.979474 7.062488 0.620535 0.308148 0.257045 399.97342 57.127669 168.854244 138.528893 2.25 2.264286 2.213364 2.226222 1.368922 1.048834 1 1.100000 0.433013 0.440952 22.0
4 4 88.944468 57.840143 66.361592 36.110716 1.181795 1.129224 122.90607 27.848743 51.968828 53.588771 775.425 1009.717143 718.152900 937.025573 1.305967 0.805230 810.6 728.807143 323.811808 356.319281 160.25 106.342857 136.126003 84.843442 1.259244 1.261194 205 36.514286 75.237540 70.634448 4654.35725 2923.845143 724.953211 53.117029 1.033129 0.859811 8958.571 1491.783429 3306.162897 3552.668664 81.8375 111.240714 60.123179 97.774719 1.159687 0.787396 127.05 80.766429 51.433712 43.452059 6.9055 3.871143 3.479475 1.044545 1.088575 1.044970 12.878 1.744571 4.599064 4.684014 107.756645 61.086617 7.062488 0.624878 0.308148 0.272820 399.97342 57.127669 168.854244 138.493671 2.25 2.242857 2.213364 2.206963 1.368922 1.096052 1 1.057143 0.433013 0.428809 23.0
In [6]:
unique_m.head()
Out[6]:
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn critical_temp material
0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.20 1.80 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 29.0 Ba0.2La1.8Cu1O4
1 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.10 1.90 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 26.0 Ba0.1La1.9Ag0.1Cu0.9O4
2 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.10 1.90 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 19.0 Ba0.1La1.9Cu1O4
3 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.15 1.85 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 22.0 Ba0.15La1.85Cu1O4
4 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.30 1.70 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 23.0 Ba0.3La1.7Cu1O4
In [7]:
unique_m.shape
Out[7]:
(21263, 88)
In [8]:
data.shape
Out[8]:
(21263, 82)

Both files from the study have the same number of records (21,263). One records the amount of each chemical element in every material, along with its formula and critical temperature, while the other contains statistical summary measures of the elemental properties of the superconductors.

Missing Values
Neither file contains missing values, so there is nothing for us to impute or reshape.

For the purposes of regression on the Critical Temp, we may proceed with the analysis.

In [9]:
# Features with Null Values and Percent missing
null_df = pd.DataFrame(data[data.columns[data.isnull().any()]].isnull().sum()).reset_index()
null_df.columns = ['Feature', 'Value']
null_df['Percent'] = round((null_df['Value'] / data.shape[0] * 100),2)

null_df
Out[9]:
Feature Value Percent
In [10]:
# Features with Null Values and Percent missing
null_df = pd.DataFrame(unique_m[unique_m.columns[unique_m.isnull().any()]].isnull().sum()).reset_index()
null_df.columns = ['Feature', 'Value']
null_df['Percent'] = round((null_df['Value'] / unique_m.shape[0] * 100),2)

null_df
Out[10]:
Feature Value Percent
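The two cells above repeat the same logic for each file. One possible refactor is a small helper that returns the same Feature / Value / Percent table for any DataFrame. The function name `null_summary` and the toy frame are hypothetical, introduced here only for illustration.

```python
import numpy as np
import pandas as pd

def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column that has missing values, with count and percent."""
    counts = df.isnull().sum()
    counts = counts[counts > 0]  # keep only the columns that have nulls
    return pd.DataFrame({
        "Feature": counts.index,
        "Value": counts.values,
        "Percent": (counts.values / len(df) * 100).round(2),
    })

# Toy frame (not the case-study data): column "a" has one missing value in four rows.
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [1, 2, 3, 4]})
toy_nulls = null_summary(toy)
print(toy_nulls)
```

An empty result, as in the outputs above, means no column has missing values.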

Duplicate Values
There are 66 duplicate rows in the feature data set (data) and none in unique_m. No action was needed.

In [11]:
# Duplicate record validation
data.duplicated().sum()
Out[11]:
66
In [12]:
# Duplicate record validation
unique_m.duplicated().sum()
Out[12]:
0
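Note that `duplicated().sum()` counts only the repeated copies of a row, not the first occurrence. The sketch below uses a toy frame (not the case-study data) to show how the repeated rows could be inspected or dropped if that were ever required; it is an illustration, not a step the analysis takes.

```python
import pandas as pd

# Toy frame: the row (1.0, 29.0) appears three times.
toy = pd.DataFrame({
    "feature": [1.0, 2.0, 1.0, 3.0, 1.0],
    "critical_temp": [29.0, 26.0, 29.0, 22.0, 29.0],
})

n_extra = int(toy.duplicated().sum())         # repeats only, first copy excluded
all_copies = toy[toy.duplicated(keep=False)]  # every row that has an identical twin
deduped = toy.drop_duplicates()               # keeps the first copy of each row

print(n_extra)
print(all_copies)
print(len(deduped))
```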
In [13]:
# Examine uniqueness of the response variable
is_unique = data['critical_temp'].nunique() == data.shape[0]

print(is_unique)
False

Join data files

In [14]:
# Join the two files row by row on their shared index (the left join keeps every row of data)
merged_df = pd.merge(data, unique_m, left_index=True, right_index=True, how="left")
In [15]:
merged_df.head()
Out[15]:
number_of_elements mean_atomic_mass wtd_mean_atomic_mass gmean_atomic_mass wtd_gmean_atomic_mass entropy_atomic_mass wtd_entropy_atomic_mass range_atomic_mass wtd_range_atomic_mass std_atomic_mass wtd_std_atomic_mass mean_fie wtd_mean_fie gmean_fie wtd_gmean_fie entropy_fie wtd_entropy_fie range_fie wtd_range_fie std_fie wtd_std_fie mean_atomic_radius wtd_mean_atomic_radius gmean_atomic_radius wtd_gmean_atomic_radius entropy_atomic_radius wtd_entropy_atomic_radius range_atomic_radius wtd_range_atomic_radius std_atomic_radius wtd_std_atomic_radius mean_Density wtd_mean_Density gmean_Density wtd_gmean_Density entropy_Density wtd_entropy_Density range_Density wtd_range_Density std_Density wtd_std_Density mean_ElectronAffinity wtd_mean_ElectronAffinity gmean_ElectronAffinity wtd_gmean_ElectronAffinity entropy_ElectronAffinity wtd_entropy_ElectronAffinity range_ElectronAffinity wtd_range_ElectronAffinity std_ElectronAffinity wtd_std_ElectronAffinity mean_FusionHeat wtd_mean_FusionHeat gmean_FusionHeat wtd_gmean_FusionHeat entropy_FusionHeat wtd_entropy_FusionHeat range_FusionHeat wtd_range_FusionHeat std_FusionHeat wtd_std_FusionHeat mean_ThermalConductivity wtd_mean_ThermalConductivity gmean_ThermalConductivity wtd_gmean_ThermalConductivity entropy_ThermalConductivity wtd_entropy_ThermalConductivity range_ThermalConductivity wtd_range_ThermalConductivity std_ThermalConductivity wtd_std_ThermalConductivity mean_Valence wtd_mean_Valence gmean_Valence wtd_gmean_Valence entropy_Valence wtd_entropy_Valence range_Valence wtd_range_Valence std_Valence wtd_std_Valence critical_temp_x H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn critical_temp_y material
0 4 88.944468 57.862692 66.361592 36.116612 1.181795 1.062396 122.90607 31.794921 51.968828 53.622535 775.425 1010.268571 718.152900 938.016780 1.305967 0.791488 810.6 735.985714 323.811808 355.562967 160.25 105.514286 136.126003 84.528423 1.259244 1.207040 205 42.914286 75.237540 69.235569 4654.35725 2961.502286 724.953211 53.543811 1.033129 0.814598 8958.571 1579.583429 3306.162897 3572.596624 81.8375 111.727143 60.123179 99.414682 1.159687 0.787382 127.05 80.987143 51.433712 42.558396 6.9055 3.846857 3.479475 1.040986 1.088575 0.994998 12.878 1.744571 4.599064 4.666920 107.756645 61.015189 7.062488 0.621979 0.308148 0.262848 399.97342 57.127669 168.854244 138.517163 2.25 2.257143 2.213364 2.219783 1.368922 1.066221 1 1.085714 0.433013 0.437059 29.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.20 1.80 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 29.0 Ba0.2La1.8Cu1O4
1 5 92.729214 58.518416 73.132787 36.396602 1.449309 1.057755 122.90607 36.161939 47.094633 53.979870 766.440 1010.612857 720.605511 938.745413 1.544145 0.807078 810.6 743.164286 290.183029 354.963511 161.20 104.971429 141.465215 84.370167 1.508328 1.204115 205 50.571429 67.321319 68.008817 5821.48580 3021.016571 1237.095080 54.095718 1.314442 0.914802 10488.571 1667.383429 3767.403176 3632.649185 90.8900 112.316429 69.833315 101.166398 1.427997 0.838666 127.05 81.207857 49.438167 41.667621 7.7844 3.796857 4.403790 1.035251 1.374977 1.073094 12.878 1.595714 4.473363 4.603000 172.205316 61.372331 16.064228 0.619735 0.847404 0.567706 429.97342 51.413383 198.554600 139.630922 2.00 2.257143 1.888175 2.210679 1.557113 1.047221 2 1.128571 0.632456 0.468606 26.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.10 1.90 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 26.0 Ba0.1La1.9Ag0.1Cu0.9O4
2 4 88.944468 57.885242 66.361592 36.122509 1.181795 0.975980 122.90607 35.741099 51.968828 53.656268 775.425 1010.820000 718.152900 939.009036 1.305967 0.773620 810.6 743.164286 323.811808 354.804183 160.25 104.685714 136.126003 84.214573 1.259244 1.132547 205 49.314286 75.237540 67.797712 4654.35725 2999.159429 724.953211 53.974022 1.033129 0.760305 8958.571 1667.383429 3306.162897 3592.019281 81.8375 112.213571 60.123179 101.082152 1.159687 0.786007 127.05 81.207857 51.433712 41.639878 6.9055 3.822571 3.479475 1.037439 1.088575 0.927479 12.878 1.757143 4.599064 4.649635 107.756645 60.943760 7.062488 0.619095 0.308148 0.250477 399.97342 57.127669 168.854244 138.540613 2.25 2.271429 2.213364 2.232679 1.368922 1.029175 1 1.114286 0.433013 0.444697 19.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.10 1.90 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 19.0 Ba0.1La1.9Cu1O4
3 4 88.944468 57.873967 66.361592 36.119560 1.181795 1.022291 122.90607 33.768010 51.968828 53.639405 775.425 1010.544286 718.152900 938.512777 1.305967 0.783207 810.6 739.575000 323.811808 355.183884 160.25 105.100000 136.126003 84.371352 1.259244 1.173033 205 46.114286 75.237540 68.521665 4654.35725 2980.330857 724.953211 53.758486 1.033129 0.788889 8958.571 1623.483429 3306.162897 3582.370597 81.8375 111.970357 60.123179 100.244950 1.159687 0.786900 127.05 81.097500 51.433712 42.102344 6.9055 3.834714 3.479475 1.039211 1.088575 0.964031 12.878 1.744571 4.599064 4.658301 107.756645 60.979474 7.062488 0.620535 0.308148 0.257045 399.97342 57.127669 168.854244 138.528893 2.25 2.264286 2.213364 2.226222 1.368922 1.048834 1 1.100000 0.433013 0.440952 22.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.15 1.85 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 22.0 Ba0.15La1.85Cu1O4
4 4 88.944468 57.840143 66.361592 36.110716 1.181795 1.129224 122.90607 27.848743 51.968828 53.588771 775.425 1009.717143 718.152900 937.025573 1.305967 0.805230 810.6 728.807143 323.811808 356.319281 160.25 106.342857 136.126003 84.843442 1.259244 1.261194 205 36.514286 75.237540 70.634448 4654.35725 2923.845143 724.953211 53.117029 1.033129 0.859811 8958.571 1491.783429 3306.162897 3552.668664 81.8375 111.240714 60.123179 97.774719 1.159687 0.787396 127.05 80.766429 51.433712 43.452059 6.9055 3.871143 3.479475 1.044545 1.088575 1.044970 12.878 1.744571 4.599064 4.684014 107.756645 61.086617 7.062488 0.624878 0.308148 0.272820 399.97342 57.127669 168.854244 138.493671 2.25 2.242857 2.213364 2.206963 1.368922 1.096052 1 1.057143 0.433013 0.428809 23.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.30 1.70 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 23.0 Ba0.3La1.7Cu1O4
In [16]:
# Check the join did not create a Cartesian product (the row count should still be 21263)
merged_df.shape
Out[16]:
(21263, 170)
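An index join assumes the two files list the materials in the same order. Because both files carry a critical_temp column, one way to sanity-check that assumption is to compare the two suffixed copies after the merge. The sketch below shows the idea on toy frames; the values and the names `left` and `right` are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the two files; both carry a critical_temp column.
left = pd.DataFrame({
    "mean_atomic_mass": [88.9, 92.7, 88.9],
    "critical_temp": [29.0, 26.0, 19.0],
})
right = pd.DataFrame({
    "O": [4.0, 4.0, 4.0],
    "critical_temp": [29.0, 26.0, 19.0],
    "material": ["Ba0.2La1.8Cu1O4", "Ba0.1La1.9Ag0.1Cu0.9O4", "Ba0.1La1.9Cu1O4"],
})

merged = pd.merge(left, right, left_index=True, right_index=True, how="left")

# The overlapping column is suffixed with _x (left) and _y (right).
same_rows = len(merged) == len(left)
same_target = np.allclose(merged["critical_temp_x"], merged["critical_temp_y"])
print(same_rows, same_target)
```

If the second check passed on the real data, dropping one of the two copies would lose no information.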
In [17]:
# Drop the secondary critical temp and the material formula that came from the unique_m dataset
merged_df_final = merged_df.drop(['critical_temp_y','material'], axis=1)
In [18]:
# Check the secondary critical temp is removed
merged_df_final.head()
Out[18]:
number_of_elements mean_atomic_mass wtd_mean_atomic_mass gmean_atomic_mass wtd_gmean_atomic_mass entropy_atomic_mass wtd_entropy_atomic_mass range_atomic_mass wtd_range_atomic_mass std_atomic_mass wtd_std_atomic_mass mean_fie wtd_mean_fie gmean_fie wtd_gmean_fie entropy_fie wtd_entropy_fie range_fie wtd_range_fie std_fie wtd_std_fie mean_atomic_radius wtd_mean_atomic_radius gmean_atomic_radius wtd_gmean_atomic_radius entropy_atomic_radius wtd_entropy_atomic_radius range_atomic_radius wtd_range_atomic_radius std_atomic_radius wtd_std_atomic_radius mean_Density wtd_mean_Density gmean_Density wtd_gmean_Density entropy_Density wtd_entropy_Density range_Density wtd_range_Density std_Density wtd_std_Density mean_ElectronAffinity wtd_mean_ElectronAffinity gmean_ElectronAffinity wtd_gmean_ElectronAffinity entropy_ElectronAffinity wtd_entropy_ElectronAffinity range_ElectronAffinity wtd_range_ElectronAffinity std_ElectronAffinity wtd_std_ElectronAffinity mean_FusionHeat wtd_mean_FusionHeat gmean_FusionHeat wtd_gmean_FusionHeat entropy_FusionHeat wtd_entropy_FusionHeat range_FusionHeat wtd_range_FusionHeat std_FusionHeat wtd_std_FusionHeat mean_ThermalConductivity wtd_mean_ThermalConductivity gmean_ThermalConductivity wtd_gmean_ThermalConductivity entropy_ThermalConductivity wtd_entropy_ThermalConductivity range_ThermalConductivity wtd_range_ThermalConductivity std_ThermalConductivity wtd_std_ThermalConductivity mean_Valence wtd_mean_Valence gmean_Valence wtd_gmean_Valence entropy_Valence wtd_entropy_Valence range_Valence wtd_range_Valence std_Valence wtd_std_Valence critical_temp_x H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn
0 4 88.944468 57.862692 66.361592 36.116612 1.181795 1.062396 122.90607 31.794921 51.968828 53.622535 775.425 1010.268571 718.152900 938.016780 1.305967 0.791488 810.6 735.985714 323.811808 355.562967 160.25 105.514286 136.126003 84.528423 1.259244 1.207040 205 42.914286 75.237540 69.235569 4654.35725 2961.502286 724.953211 53.543811 1.033129 0.814598 8958.571 1579.583429 3306.162897 3572.596624 81.8375 111.727143 60.123179 99.414682 1.159687 0.787382 127.05 80.987143 51.433712 42.558396 6.9055 3.846857 3.479475 1.040986 1.088575 0.994998 12.878 1.744571 4.599064 4.666920 107.756645 61.015189 7.062488 0.621979 0.308148 0.262848 399.97342 57.127669 168.854244 138.517163 2.25 2.257143 2.213364 2.219783 1.368922 1.066221 1 1.085714 0.433013 0.437059 29.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.20 1.80 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0
1 5 92.729214 58.518416 73.132787 36.396602 1.449309 1.057755 122.90607 36.161939 47.094633 53.979870 766.440 1010.612857 720.605511 938.745413 1.544145 0.807078 810.6 743.164286 290.183029 354.963511 161.20 104.971429 141.465215 84.370167 1.508328 1.204115 205 50.571429 67.321319 68.008817 5821.48580 3021.016571 1237.095080 54.095718 1.314442 0.914802 10488.571 1667.383429 3767.403176 3632.649185 90.8900 112.316429 69.833315 101.166398 1.427997 0.838666 127.05 81.207857 49.438167 41.667621 7.7844 3.796857 4.403790 1.035251 1.374977 1.073094 12.878 1.595714 4.473363 4.603000 172.205316 61.372331 16.064228 0.619735 0.847404 0.567706 429.97342 51.413383 198.554600 139.630922 2.00 2.257143 1.888175 2.210679 1.557113 1.047221 2 1.128571 0.632456 0.468606 26.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.10 1.90 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0
2 4 88.944468 57.885242 66.361592 36.122509 1.181795 0.975980 122.90607 35.741099 51.968828 53.656268 775.425 1010.820000 718.152900 939.009036 1.305967 0.773620 810.6 743.164286 323.811808 354.804183 160.25 104.685714 136.126003 84.214573 1.259244 1.132547 205 49.314286 75.237540 67.797712 4654.35725 2999.159429 724.953211 53.974022 1.033129 0.760305 8958.571 1667.383429 3306.162897 3592.019281 81.8375 112.213571 60.123179 101.082152 1.159687 0.786007 127.05 81.207857 51.433712 41.639878 6.9055 3.822571 3.479475 1.037439 1.088575 0.927479 12.878 1.757143 4.599064 4.649635 107.756645 60.943760 7.062488 0.619095 0.308148 0.250477 399.97342 57.127669 168.854244 138.540613 2.25 2.271429 2.213364 2.232679 1.368922 1.029175 1 1.114286 0.433013 0.444697 19.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.10 1.90 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0
3 4 88.944468 57.873967 66.361592 36.119560 1.181795 1.022291 122.90607 33.768010 51.968828 53.639405 775.425 1010.544286 718.152900 938.512777 1.305967 0.783207 810.6 739.575000 323.811808 355.183884 160.25 105.100000 136.126003 84.371352 1.259244 1.173033 205 46.114286 75.237540 68.521665 4654.35725 2980.330857 724.953211 53.758486 1.033129 0.788889 8958.571 1623.483429 3306.162897 3582.370597 81.8375 111.970357 60.123179 100.244950 1.159687 0.786900 127.05 81.097500 51.433712 42.102344 6.9055 3.834714 3.479475 1.039211 1.088575 0.964031 12.878 1.744571 4.599064 4.658301 107.756645 60.979474 7.062488 0.620535 0.308148 0.257045 399.97342 57.127669 168.854244 138.528893 2.25 2.264286 2.213364 2.226222 1.368922 1.048834 1 1.100000 0.433013 0.440952 22.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.15 1.85 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0
4 4 88.944468 57.840143 66.361592 36.110716 1.181795 1.129224 122.90607 27.848743 51.968828 53.588771 775.425 1009.717143 718.152900 937.025573 1.305967 0.805230 810.6 728.807143 323.811808 356.319281 160.25 106.342857 136.126003 84.843442 1.259244 1.261194 205 36.514286 75.237540 70.634448 4654.35725 2923.845143 724.953211 53.117029 1.033129 0.859811 8958.571 1491.783429 3306.162897 3552.668664 81.8375 111.240714 60.123179 97.774719 1.159687 0.787396 127.05 80.766429 51.433712 43.452059 6.9055 3.871143 3.479475 1.044545 1.088575 1.044970 12.878 1.744571 4.599064 4.684014 107.756645 61.086617 7.062488 0.624878 0.308148 0.272820 399.97342 57.127669 168.854244 138.493671 2.25 2.242857 2.213364 2.206963 1.368922 1.096052 1 1.057143 0.433013 0.428809 23.0 0.0 0 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0.0 0.30 1.70 0.0 0.0 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0

Data Type Conversion
In this section we grouped the features by their correct data type and converted each feature to its corresponding type. This makes the statistics of each feature type much easier to analyze.

In [19]:
# Features grouped by data type
cat_features = ['number_of_elements','H','He','Li','Be','B','C','N','O','F','Ne','Na','Mg','Al','Si','P','S','Cl','Ar','K',
                'Ca','Sc','Ti','V','Cr','Mn','Fe','Co','Ni','Cu','Zn','Ga','Ge','As','Se','Br','Kr','Rb','Sr','Y','Zr','Nb',
                'Mo','Tc','Ru','Rh','Pd','Ag','Cd','In','Sn','Sb','Te','I','Xe','Cs','Ba','La','Ce','Pr','Nd','Pm','Sm','Eu',
                'Gd','Tb','Dy','Ho','Er','Tm','Yb','Lu','Hf','Ta','W','Re','Os','Ir','Pt','Au','Hg','Tl','Pb','Bi','Po','At',
                'Rn']
cont_features = ['mean_atomic_mass','wtd_mean_atomic_mass','gmean_atomic_mass','wtd_gmean_atomic_mass',
                 'entropy_atomic_mass','wtd_entropy_atomic_mass','range_atomic_mass','wtd_range_atomic_mass','std_atomic_mass',
                 'wtd_std_atomic_mass','mean_fie', 'wtd_mean_fie','gmean_fie','wtd_gmean_fie','entropy_fie','wtd_entropy_fie',
                 'range_fie','wtd_range_fie','std_fie','wtd_std_fie','mean_atomic_radius','wtd_mean_atomic_radius',
                 'gmean_atomic_radius','wtd_gmean_atomic_radius','entropy_atomic_radius','wtd_entropy_atomic_radius',
                 'range_atomic_radius','wtd_range_atomic_radius','std_atomic_radius','wtd_std_atomic_radius','mean_Density',
                 'wtd_mean_Density','gmean_Density','wtd_gmean_Density','entropy_Density','wtd_entropy_Density','range_Density',
                 'wtd_range_Density','std_Density','wtd_std_Density','mean_ElectronAffinity','wtd_mean_ElectronAffinity',
                 'gmean_ElectronAffinity','wtd_gmean_ElectronAffinity','entropy_ElectronAffinity','wtd_entropy_ElectronAffinity',
                 'range_ElectronAffinity','wtd_range_ElectronAffinity','std_ElectronAffinity','wtd_std_ElectronAffinity',
                 'mean_FusionHeat','wtd_mean_FusionHeat','gmean_FusionHeat','wtd_gmean_FusionHeat','entropy_FusionHeat',
                 'wtd_entropy_FusionHeat','range_FusionHeat','wtd_range_FusionHeat','std_FusionHeat','wtd_std_FusionHeat',
                 'mean_ThermalConductivity','wtd_mean_ThermalConductivity','gmean_ThermalConductivity',
                 'wtd_gmean_ThermalConductivity','entropy_ThermalConductivity','wtd_entropy_ThermalConductivity',
                 'range_ThermalConductivity','wtd_range_ThermalConductivity','std_ThermalConductivity',
                 'wtd_std_ThermalConductivity','mean_Valence','wtd_mean_Valence','gmean_Valence','wtd_gmean_Valence',
                 'entropy_Valence','wtd_entropy_Valence','range_Valence','wtd_range_Valence','std_Valence','wtd_std_Valence']
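
The cell above only defines the two lists; the conversion step itself is not shown. A minimal sketch of one way to apply the groups, assuming a small stand-in frame (`demo_df`, hypothetical) in place of `merged_df_final`, shortened hypothetical lists, and assuming the categorical group is cast to the pandas `category` dtype and the continuous group to `float64`:

```python
import pandas as pd

# Hypothetical stand-in for merged_df_final, using a few of the real column names.
demo_df = pd.DataFrame({
    "number_of_elements": [4, 4, 3],
    "O": [4.0, 4.0, 2.0],
    "mean_atomic_mass": [88.944468, 88.944468, 76.5],
})

# Shortened, illustrative versions of cat_features and cont_features.
demo_cat_features = ["number_of_elements", "O"]
demo_cont_features = ["mean_atomic_mass"]

# Build one column -> dtype mapping and cast everything in a single astype call.
dtype_map = {col: "category" for col in demo_cat_features}
dtype_map.update({col: "float64" for col in demo_cont_features})
converted = demo_df.astype(dtype_map)

print(converted.dtypes)
```

With the groups cast this way, `converted[demo_cat_features].describe()` reports counts and most frequent levels, while the continuous group keeps the usual numeric summary.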

Outliers

In [20]:
# Histogram of Critical Temperature
fig = px.histogram(merged_df_final, x="critical_temp_x", nbins = 20)
fig.show()

The most interesting detail in this histogram is the overall variation in the distribution of critical temperature. The temperature at which a superconductor goes critical varies greatly, from a low of 1 degree Celsius to a high of 144 degrees Celsius. Across the little over 21,000 observations in the study, even the most frequently recorded value, 80 degrees Celsius, accounts for only 143 observations (0.67%). The histogram reveals a right-tailed distribution with some apparent outliers, so we may want to log-transform critical temperature to reduce the influence of the tail on our analysis.
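
The log scaling suggested above can be sketched as follows. This is only an illustration on a hypothetical sample (`temps`), and it assumes `log1p` rather than a plain logarithm so that values close to zero stay well behaved:

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed sample standing in for critical_temp_x.
temps = pd.Series([1.0, 5.4, 20.0, 20.0, 63.0, 80.0, 144.0], name="critical_temp_x")

# log1p computes log(1 + x), which stays finite for values near zero.
log_temps = np.log1p(temps)

# expm1 inverts log1p, so predictions made on the log scale can be mapped back.
recovered = np.expm1(log_temps)

print(pd.DataFrame({"raw": temps, "log1p": log_temps, "recovered": recovered}))
```

If a model is fit on the transformed target, its predictions need the inverse transform before they are reported as temperatures.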

In [21]:
# Box Plot - Number of Elements
# Note: comparing the float column to True keeps only rows where critical_temp_x equals 1.0
fig = px.box(merged_df_final[merged_df_final['critical_temp_x']==True], x='number_of_elements',
             width=800, height=400, title='Box Plot -Number of Elements')
fig.show()

The box plot above, drawn for the rows selected by the filter in the cell, indicates that these superconductors typically contain 3 or fewer elements, with an upper bound of 4. A few outliers with 6 elements exist, but the vast majority contain three.
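
Box plots conventionally flag points that lie more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule, assuming a hypothetical sample and a hypothetical helper named `iqr_outlier_mask`; the quartile method used by the plotting library may differ slightly from the pandas default used here:

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical sample: mostly 3 elements, a few at 4, one at 2, one at 6.
sample = pd.Series([3, 3, 3, 3, 3, 3, 4, 4, 2, 6], name="number_of_elements")
mask = iqr_outlier_mask(sample)

print(sample[mask].tolist())
```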

Simple Statistics¶

In [22]:
summary_stats = merged_df_final.describe()

summary_stats_tb = summary_stats.transpose()

pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:2f}'.format) # Format numeric values
print(summary_stats_tb)
                             count      mean       std      min       25%  \
number_of_elements    21263.000000  4.115224  1.439295 1.000000  3.000000   
mean_atomic_mass      21263.000000 87.557631 29.676497 6.941000 72.458076   
wtd_mean_atomic_mass  21263.000000 72.988310 33.490406 6.423452 52.143839   
gmean_atomic_mass     21263.000000 71.290627 31.030272 5.320573 58.041225   
wtd_gmean_atomic_mass 21263.000000 58.539916 36.651067 1.960849 35.248990   
...                            ...       ...       ...      ...       ...   
Pb                    21263.000000  0.042461  0.274365 0.000000  0.000000   
Bi                    21263.000000  0.201009  0.655927 0.000000  0.000000   
Po                    21263.000000  0.000000  0.000000 0.000000  0.000000   
At                    21263.000000  0.000000  0.000000 0.000000  0.000000   
Rn                    21263.000000  0.000000  0.000000 0.000000  0.000000   

                            50%        75%        max  
number_of_elements     4.000000   5.000000   9.000000  
mean_atomic_mass      84.922750 100.404410 208.980400  
wtd_mean_atomic_mass  60.696571  86.103540 208.980400  
gmean_atomic_mass     66.361592  78.116681 208.980400  
wtd_gmean_atomic_mass 39.918385  73.113234 208.980400  
...                         ...        ...        ...  
Pb                     0.000000   0.000000  19.000000  
Bi                     0.000000   0.000000  14.000000  
Po                     0.000000   0.000000   0.000000  
At                     0.000000   0.000000   0.000000  
Rn                     0.000000   0.000000   0.000000  

[168 rows x 8 columns]
In [23]:
summary_stats_m = unique_m.describe()

summary_stats_tb_m = summary_stats_m.transpose()

pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', '{:2f}'.format) # Format numeric values
print(summary_stats_tb_m)
                     count      mean       std      min      25%       50%  \
H             21263.000000  0.017685  0.267220 0.000000 0.000000  0.000000   
He            21263.000000  0.000000  0.000000 0.000000 0.000000  0.000000   
Li            21263.000000  0.012125  0.129552 0.000000 0.000000  0.000000   
Be            21263.000000  0.034638  0.848541 0.000000 0.000000  0.000000   
B             21263.000000  0.142594  1.044486 0.000000 0.000000  0.000000   
...                    ...       ...       ...      ...      ...       ...   
Bi            21263.000000  0.201009  0.655927 0.000000 0.000000  0.000000   
Po            21263.000000  0.000000  0.000000 0.000000 0.000000  0.000000   
At            21263.000000  0.000000  0.000000 0.000000 0.000000  0.000000   
Rn            21263.000000  0.000000  0.000000 0.000000 0.000000  0.000000   
critical_temp 21263.000000 34.421219 34.254362 0.000210 5.365000 20.000000   

                    75%        max  
H              0.000000  14.000000  
He             0.000000   0.000000  
Li             0.000000   3.000000  
Be             0.000000  40.000000  
B              0.000000 105.000000  
...                 ...        ...  
Bi             0.000000  14.000000  
Po             0.000000   0.000000  
At             0.000000   0.000000  
Rn             0.000000   0.000000  
critical_temp 63.000000 185.000000  

[87 rows x 8 columns]
In [24]:
# Critical Temperature Frequency
crit_temp_df = pd.DataFrame(merged_df_final['critical_temp_x'].value_counts()).reset_index()
crit_temp_df.columns = ['critical_temp', 'Count']
crit_temp_df['Frequency'] = round(crit_temp_df['Count'] / sum(crit_temp_df['Count']) * 100, 2)
crit_temp_df
Out[24]:
critical_temp Count Frequency
0 80.000000 143 0.670000
1 20.000000 129 0.610000
2 30.000000 125 0.590000
3 90.000000 122 0.570000
4 40.000000 111 0.520000
... ... ... ...
3002 6.170000 1 0.000000
3003 4.345000 1 0.000000
3004 20.460000 1 0.000000
3005 19.090000 1 0.000000
3006 122.100000 1 0.000000

3007 rows × 3 columns

Visualize Attributes¶

Below is a histogram of the mean atomic mass of the superconductors in the data. The count of observations rises rapidly around the 72 - 78 mark, drops significantly at around 80, and rises again at around 88. From there it declines gradually through many peaks and valleys until about the 144 mark, where the peaks become noticeably smaller and less frequent. The bulk of the data lies between roughly 70 and 90, with a median mean atomic mass of 84.9.

In [25]:
# Histogram of Atomic mass
fig = px.histogram(merged_df_final, x='mean_atomic_mass', marginal="box",width=800, height=400, 
                   title='Distribution Plot - Average Atomic Mass')
fig.show()

Below is a histogram of weighted mean thermal conductivity. Its shape is similar to that of the mean atomic mass distribution above, with comparable peaks and valleys but a more pronounced right skew. Counts grow rapidly toward a massive peak between 60 and 62, and the average is 73.3. Thermal conductivity is the ability of a material to move heat quickly and efficiently; materials with a high thermal conductivity can transfer heat rapidly from one location to another.

In [26]:
# Histogram by Weighted Thermal Conductivity
fig = px.histogram(merged_df_final, x='wtd_mean_ThermalConductivity', 
                   marginal="box",width=800, height=400, title='Distribution Plot - Thermal Conductivity')
fig.show()
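
The right skew described for thermal conductivity can also be checked numerically. A minimal sketch on a hypothetical sample (`conductivity`), assuming that a mean above the median together with a positive sample skewness is an adequate indicator of a longer right tail:

```python
import pandas as pd

# Hypothetical right-skewed sample standing in for wtd_mean_ThermalConductivity.
conductivity = pd.Series([55.0, 60.0, 61.0, 61.5, 62.0, 70.0, 95.0, 140.0, 230.0])

mean_value = conductivity.mean()
median_value = conductivity.median()
skewness = conductivity.skew()  # sample skewness; positive means a longer right tail

print(f"mean={mean_value:.1f}, median={median_value:.1f}, skew={skewness:.2f}")
```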

Explore Correlations¶

In [27]:
# First step to explore any relationships between data would be to do a correlation
merged_df_final.corr()
Out[27]:
number_of_elements mean_atomic_mass wtd_mean_atomic_mass gmean_atomic_mass wtd_gmean_atomic_mass entropy_atomic_mass wtd_entropy_atomic_mass range_atomic_mass wtd_range_atomic_mass std_atomic_mass wtd_std_atomic_mass mean_fie wtd_mean_fie gmean_fie wtd_gmean_fie entropy_fie wtd_entropy_fie range_fie wtd_range_fie std_fie wtd_std_fie mean_atomic_radius wtd_mean_atomic_radius gmean_atomic_radius wtd_gmean_atomic_radius entropy_atomic_radius wtd_entropy_atomic_radius range_atomic_radius wtd_range_atomic_radius std_atomic_radius wtd_std_atomic_radius mean_Density wtd_mean_Density gmean_Density wtd_gmean_Density entropy_Density wtd_entropy_Density range_Density wtd_range_Density std_Density wtd_std_Density mean_ElectronAffinity wtd_mean_ElectronAffinity gmean_ElectronAffinity wtd_gmean_ElectronAffinity entropy_ElectronAffinity wtd_entropy_ElectronAffinity range_ElectronAffinity wtd_range_ElectronAffinity std_ElectronAffinity wtd_std_ElectronAffinity mean_FusionHeat wtd_mean_FusionHeat gmean_FusionHeat wtd_gmean_FusionHeat entropy_FusionHeat wtd_entropy_FusionHeat range_FusionHeat wtd_range_FusionHeat std_FusionHeat wtd_std_FusionHeat mean_ThermalConductivity wtd_mean_ThermalConductivity gmean_ThermalConductivity wtd_gmean_ThermalConductivity entropy_ThermalConductivity wtd_entropy_ThermalConductivity range_ThermalConductivity wtd_range_ThermalConductivity std_ThermalConductivity wtd_std_ThermalConductivity mean_Valence wtd_mean_Valence gmean_Valence wtd_gmean_Valence entropy_Valence wtd_entropy_Valence range_Valence wtd_range_Valence std_Valence wtd_std_Valence critical_temp_x H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn
number_of_elements 1.000000 -0.141923 -0.353064 -0.292969 -0.454525 0.939304 0.881845 0.682777 -0.320293 0.513998 0.546391 0.167451 0.484445 0.024229 0.424152 0.973195 0.719209 0.781227 0.329624 0.674005 0.717831 -0.001389 -0.422144 -0.240444 -0.518256 0.972245 0.904121 0.768060 -0.371350 0.624810 0.695089 -0.418675 -0.507895 -0.630504 -0.649882 0.871832 0.767078 0.413486 -0.355389 0.210724 0.334072 -0.119303 0.195608 -0.356067 -0.052884 0.877304 0.625798 0.531540 0.241411 0.423738 0.480813 -0.437624 -0.449272 -0.514252 -0.519109 0.900759 0.860479 0.005734 -0.371788 -0.113361 -0.074796 0.227656 0.206069 -0.485324 -0.469206 0.501871 0.207065 0.696060 0.316772 0.602018 0.665580 -0.609412 -0.648551 -0.618512 -0.659268 0.967832 0.892559 0.231874 -0.447770 0.105365 0.035216 0.601069 0.000950 NaN -0.033713 -0.039801 -0.069869 -0.076534 -0.046001 0.541654 0.083068 NaN -0.016082 -0.062128 -0.044387 -0.069016 -0.028974 -0.058567 0.015625 NaN -0.048214 0.260772 -0.027056 -0.042975 -0.065039 -0.016153 -0.013550 0.023825 -0.029350 -0.034140 0.416454 -0.008316 -0.062019 -0.053670 0.028742 -0.087924 -0.002040 NaN -0.051874 0.475223 0.165323 -0.092858 -0.075090 -0.062211 -0.055049 -0.041788 -0.059083 -0.048377 -0.023183 -0.009915 -0.053554 -0.055459 -0.038986 -0.048162 0.018590 NaN -0.030937 0.288504 -0.006440 0.057134 0.010574 0.086448 NaN 0.059387 0.092002 0.131634 -0.004529 0.033270 0.025838 0.038728 0.002195 0.013471 -0.022754 -0.040189 -0.042999 -0.053299 -0.042371 -0.056916 -0.049955 -0.036926 -0.027695 0.116987 0.113393 0.064947 0.249738 NaN NaN NaN
mean_atomic_mass -0.141923 1.000000 0.815977 0.940298 0.745841 -0.104000 -0.097609 0.125659 0.446225 0.196460 0.130675 -0.285782 -0.222097 -0.240565 -0.219381 -0.166935 -0.163565 -0.255628 -0.080545 -0.276561 -0.222812 0.497664 0.376760 0.561061 0.359894 -0.140034 -0.147604 -0.270695 0.141100 -0.326403 -0.280440 0.756861 0.608935 0.596485 0.525588 -0.043416 0.026325 0.198067 0.342391 0.245042 0.180943 0.088230 0.061103 0.189282 0.134382 -0.091539 -0.107651 -0.187069 0.010235 -0.164960 -0.133101 -0.137669 -0.135429 0.014818 -0.043003 -0.008499 -0.028541 -0.347582 -0.167528 -0.337969 -0.335778 -0.158266 -0.065989 0.006004 0.056394 -0.100077 -0.098221 -0.114538 -0.027790 -0.110658 -0.110856 0.374099 0.304683 0.392153 0.321399 -0.156786 -0.145610 -0.107450 0.168633 -0.080279 -0.081253 -0.113523 -0.111504 NaN -0.130059 -0.021678 -0.120888 -0.115583 -0.111170 -0.093619 -0.069942 NaN -0.115777 -0.155525 -0.045626 -0.018588 -0.021524 -0.006956 -0.112718 NaN -0.088519 -0.041040 -0.014483 -0.026578 -0.052516 -0.010145 -0.016231 -0.076634 -0.020210 -0.066038 -0.070633 0.020962 -0.031919 0.049963 -0.022760 0.028460 -0.043488 NaN -0.055412 -0.003923 -0.126102 -0.008436 -0.029126 -0.004621 0.009937 0.017572 0.036436 0.016032 -0.014841 0.001028 0.106627 0.054920 0.056644 0.084814 0.001330 NaN -0.000976 -0.044394 0.056738 0.083521 0.007759 0.010153 NaN -0.003401 0.033260 0.030458 0.028056 0.020760 0.016547 0.032158 0.028773 0.038125 0.083665 0.045299 0.061727 0.053075 0.063347 0.136228 0.089754 0.107023 0.055121 0.101282 0.113666 0.197505 0.121567 NaN NaN NaN
wtd_mean_atomic_mass -0.353064 0.815977 1.000000 0.848242 0.964085 -0.308046 -0.412666 -0.144029 0.716623 -0.060739 -0.089471 -0.209296 -0.522595 -0.109490 -0.508109 -0.369773 -0.129779 -0.452303 -0.420457 -0.459323 -0.492250 0.288451 0.660011 0.468457 0.667112 -0.345071 -0.400483 -0.524861 0.363882 -0.551141 -0.554820 0.749261 0.842665 0.712815 0.767011 -0.246377 -0.195894 -0.002868 0.585687 0.103157 0.009921 0.147303 -0.096427 0.272261 0.021877 -0.290220 -0.093796 -0.225890 -0.204480 -0.197729 -0.210757 0.006730 0.014681 0.164239 0.120044 -0.225287 -0.237218 -0.283420 -0.070411 -0.253911 -0.272806 -0.236418 -0.058075 0.184990 0.250226 -0.076936 0.025638 -0.376573 -0.108512 -0.362512 -0.350993 0.534450 0.545587 0.539780 0.548981 -0.375718 -0.331025 -0.039155 0.330904 -0.003681 0.077323 -0.312272 -0.087297 NaN -0.064887 -0.051196 -0.109217 -0.136420 -0.062879 -0.416796 -0.027159 NaN -0.090717 -0.102286 -0.033009 -0.029212 -0.011657 -0.013994 -0.067755 NaN -0.071718 -0.080540 -0.002800 -0.020755 -0.032313 -0.004656 0.000862 -0.018524 0.008220 -0.024193 -0.229718 -0.001851 -0.000182 0.047000 0.000950 0.052034 -0.020349 NaN -0.071687 -0.092916 -0.174097 0.028848 0.034874 0.019095 0.025377 0.027995 0.044557 0.050368 0.005236 0.009345 0.124607 0.077407 0.078442 0.096987 0.016405 NaN -0.033085 -0.220160 0.043506 0.074608 0.010219 -0.039916 NaN -0.019019 -0.022564 -0.035797 0.015752 -0.017355 -0.020333 -0.010473 0.009757 0.023149 0.053253 0.052581 0.068113 0.064520 0.080614 0.124673 0.098004 0.111642 0.067790 0.037561 0.056807 0.169794 0.115858 NaN NaN NaN
gmean_atomic_mass -0.292969 0.940298 0.848242 1.000000 0.856975 -0.190214 -0.232183 -0.175861 0.458473 -0.121708 -0.166042 -0.367690 -0.354664 -0.286844 -0.341585 -0.316670 -0.287701 -0.431689 -0.155439 -0.450045 -0.390843 0.510867 0.488822 0.647560 0.496461 -0.282048 -0.311701 -0.460197 0.240296 -0.512841 -0.462397 0.779757 0.677131 0.728477 0.663642 -0.125672 -0.093881 -0.024975 0.368143 0.037866 -0.037299 0.079376 -0.006353 0.219651 0.111858 -0.238002 -0.224757 -0.284098 -0.055316 -0.249926 -0.238028 -0.092244 -0.089139 0.086599 0.024199 -0.126798 -0.171928 -0.384838 -0.128758 -0.361244 -0.372666 -0.190937 -0.104940 0.110769 0.131642 -0.116455 -0.104644 -0.243465 -0.095661 -0.233587 -0.232079 0.487021 0.427961 0.511508 0.450357 -0.306246 -0.307662 -0.165010 0.272303 -0.124627 -0.117336 -0.230345 -0.119243 NaN -0.138087 -0.041760 -0.145190 -0.114904 -0.109134 -0.193470 -0.086346 NaN -0.100417 -0.136540 -0.032450 -0.021309 -0.014900 -0.017244 -0.089040 NaN -0.081515 -0.078359 -0.012664 -0.009310 -0.024013 -0.003682 -0.005250 -0.033847 -0.001253 -0.053141 -0.132853 0.025385 -0.002493 0.068915 0.000392 0.065670 -0.037552 NaN -0.052847 -0.072322 -0.115229 0.025199 -0.000178 0.009791 0.024807 0.033432 0.043205 0.033350 -0.010431 0.003324 0.137184 0.076803 0.076529 0.103842 -0.003148 NaN -0.011229 -0.111109 0.063909 0.087015 0.003674 -0.025485 NaN -0.025630 0.016533 0.003667 0.022969 -0.005685 -0.014925 0.005598 0.017337 0.026523 0.052581 0.043650 0.063578 0.018147 0.062554 0.140039 0.074542 0.091775 0.056722 0.032206 0.060171 0.169211 0.034077 NaN NaN NaN
wtd_gmean_atomic_mass -0.454525 0.745841 0.964085 0.856975 1.000000 -0.370561 -0.484664 -0.352093 0.673326 -0.274487 -0.331657 -0.276668 -0.612317 -0.154323 -0.588014 -0.471280 -0.227652 -0.575369 -0.451326 -0.578719 -0.617363 0.301508 0.720901 0.527074 0.749593 -0.441916 -0.514618 -0.645663 0.432896 -0.665166 -0.681130 0.740131 0.852608 0.789208 0.843708 -0.300078 -0.273122 -0.163939 0.576836 -0.048110 -0.174098 0.119314 -0.158608 0.274209 -0.011941 -0.395866 -0.194127 -0.300521 -0.246388 -0.262822 -0.291153 0.054752 0.072658 0.219751 0.189353 -0.313735 -0.349844 -0.279300 -0.006009 -0.239074 -0.277760 -0.248849 -0.056793 0.271083 0.322335 -0.076157 0.020495 -0.464856 -0.129212 -0.447236 -0.431027 0.599413 0.614100 0.608417 0.623261 -0.477785 -0.448072 -0.078641 0.409674 -0.033313 0.030361 -0.369858 -0.081418 NaN -0.077289 -0.042417 -0.114284 -0.101973 -0.062112 -0.475707 -0.039707 NaN -0.064887 -0.085839 -0.017499 -0.023541 -0.004654 -0.017157 -0.049457 NaN -0.056773 -0.080323 -0.000193 -0.003153 -0.007198 0.000615 0.008567 0.021970 0.022549 -0.011455 -0.257135 0.004203 0.022255 0.060508 0.028005 0.081196 -0.017422 NaN -0.050966 -0.136241 -0.166371 0.051628 0.058673 0.032422 0.035285 0.038298 0.047940 0.061469 0.006435 0.011334 0.144345 0.091147 0.090722 0.106489 0.007775 NaN -0.020512 -0.249967 0.040336 0.072340 0.008961 -0.068187 NaN -0.036996 -0.030590 -0.049162 0.015089 -0.030796 -0.036411 -0.025176 0.003907 0.018494 0.022324 0.048340 0.066058 0.038396 0.078045 0.121542 0.074275 0.092166 0.066047 -0.006049 0.019589 0.148454 0.029298 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Pb 0.064947 0.197505 0.169794 0.169211 0.148454 0.046266 0.048743 0.101324 0.101449 0.103382 0.070390 -0.043793 -0.038166 -0.032042 -0.035478 0.047307 0.053384 -0.020137 -0.029629 -0.039860 -0.028675 -0.002635 0.019809 0.023962 0.028374 0.053665 0.042746 -0.038289 -0.015530 -0.066456 -0.051666 0.063647 0.056537 0.043781 0.044145 0.062639 0.081242 0.021400 0.023113 0.017309 0.007919 -0.057314 -0.043074 -0.052592 -0.061273 0.035930 0.023695 -0.001974 -0.029124 -0.009704 0.018760 -0.083310 -0.070968 -0.070125 -0.064741 0.056560 0.060450 -0.052320 -0.059616 -0.054728 -0.053379 -0.034022 -0.007171 -0.026512 -0.026505 0.070847 0.049838 0.005455 -0.013186 -0.011324 0.003938 0.019358 0.012676 0.018126 0.013143 0.041077 0.050627 0.032337 -0.005013 0.028546 0.023821 0.016864 -0.007610 NaN -0.013029 -0.006293 -0.020985 -0.012727 -0.012129 -0.005860 -0.003560 NaN -0.006636 -0.012837 -0.008287 -0.013228 -0.008855 0.053905 -0.011700 NaN -0.015719 0.055194 -0.008142 -0.008092 -0.009833 -0.001828 -0.002963 -0.033166 -0.009377 -0.014192 0.008438 -0.005043 -0.009751 -0.012500 -0.021840 0.051451 -0.007267 NaN -0.009955 0.135993 -0.026217 -0.011208 -0.013605 0.013636 -0.005478 -0.011000 0.018651 -0.006152 0.030714 -0.000280 -0.009717 -0.008342 -0.007892 0.009317 -0.008105 NaN 0.000535 -0.077901 -0.006656 -0.015474 -0.003790 -0.016002 NaN -0.010050 0.015950 -0.016513 -0.001295 -0.007175 0.000810 -0.008278 -0.007613 -0.000982 -0.013492 -0.006790 -0.003737 -0.009643 -0.005022 -0.012343 -0.011016 -0.014156 0.000110 -0.002473 0.008468 1.000000 0.067402 NaN NaN NaN
Bi 0.249738 0.121567 0.115858 0.034077 0.029298 0.167009 0.155360 0.381924 0.088625 0.341030 0.371978 0.028473 0.048413 0.014709 0.041231 0.218048 0.234481 0.148581 -0.024473 0.105363 0.112067 -0.110631 -0.092406 -0.113207 -0.084334 0.223312 0.241132 0.090336 -0.152969 0.033068 0.041451 -0.073299 -0.073785 -0.129492 -0.118404 0.165771 0.221612 0.067845 -0.101047 0.062044 0.082554 0.038165 0.059036 -0.046882 -0.037509 0.207058 0.210033 0.105201 -0.001571 0.073922 0.104452 -0.161109 -0.150656 -0.148619 -0.145667 0.241057 0.294931 -0.111558 -0.148439 -0.128617 -0.125032 0.037179 -0.026681 -0.145483 -0.129927 0.100508 0.178005 0.154591 -0.034808 0.124353 0.102590 -0.059411 -0.061718 -0.086916 -0.089660 0.186299 0.215864 0.274992 -0.086750 0.248179 0.320986 0.162499 -0.017418 NaN -0.019683 -0.012510 -0.041839 -0.025968 -0.024379 0.171170 0.028434 NaN -0.016969 -0.027832 -0.016781 -0.026202 -0.018373 0.036544 -0.021419 NaN 0.013228 0.235104 -0.017908 -0.017290 -0.019883 -0.005678 -0.004555 -0.064186 -0.017051 -0.023564 0.081941 -0.008636 -0.019566 -0.024759 -0.044066 -0.003520 -0.014390 NaN -0.015667 0.535282 -0.093645 -0.023403 -0.027769 -0.021246 -0.010847 -0.021787 -0.012459 -0.015026 -0.006477 -0.003714 -0.017900 -0.019085 -0.016092 0.015407 0.047880 NaN 0.007103 -0.166755 -0.025539 -0.019751 -0.006792 -0.025352 NaN -0.026261 -0.009927 -0.031803 -0.009164 -0.022177 -0.015801 -0.024034 -0.005980 -0.014690 -0.028373 -0.013137 -0.012258 -0.019147 -0.009822 -0.024442 -0.021150 -0.031599 -0.007015 -0.049317 -0.048545 0.067402 1.000000 NaN NaN NaN
Po NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
At NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Rn NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

168 rows × 168 columns

In [28]:
# calculate the correlations
correlations = merged_df_final.corr()

# list of all features except the target
all_features = merged_df_final.columns.tolist()
target_variable = 'critical_temp_x'
all_features.remove(target_variable)

#Define the number of features per group
features_per_group = 4
total_features = len(all_features)
total_groups = (total_features + features_per_group - 1) // features_per_group

# Iterate through groups
for group_num in range(total_groups):
    start_idx = group_num * features_per_group
    end_idx = min((group_num + 1) * features_per_group, total_features)
    selected_features = all_features[start_idx:end_idx]
    
    #Create a new figure for each group 
    plt.figure(figsize=(15,10))
    
    for idx, feature in enumerate(selected_features):
        plt.subplot(2, 4, idx + 1)
        sns.scatterplot(data=merged_df_final, x=feature, y=target_variable)    
        
        # Calculate correlation coefficient
        corr_coeff = correlations.loc[feature, target_variable]
        
        # Annotate with the correlation coefficient
        plt.text(0.5, 0.9, f'Corr: {corr_coeff: .2f}', horizontalalignment='center',
                verticalalignment='center',transform=plt.gca().transAxes, fontsize=10)
        
        plt.title(f"Scatter Plot: {target_variable} vs {feature}", fontsize = 9) # adjust title size here
        
    plt.tight_layout()
    plt.subplots_adjust(top=0.9) # Adjust top spacing for the overall title
    plt.suptitle(f"Correlation Plots - Group {group_num+1}", fontsize = 16) # Overall title
    plt.show()        

The correlation plots give insight into which features might have some relationship with our response variable, critical temperature. As we previously explored, there is a positive correlation (0.60) between the number of elements and the critical temperature. There is a strong positive correlation of 0.72 between critical temperature and the weighted standard deviation of thermal conductivity, meaning that critical temperature tends to be higher when the weighted standard deviation of thermal conductivity is higher. Of the elements featured in the study, Ba (barium) and O (oxygen) show moderate correlations with critical temperature of 0.56 and 0.57, respectively. There is a moderate negative correlation of -0.62 between critical temperature and the weighted geometric mean of valence.
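
Reading the coefficients off individual scatter plots can be complemented by ranking every feature by the strength of its correlation with the target. A minimal sketch on a hypothetical toy frame (`toy_df`); the column names mirror the notebook, but the values are synthetic and the resulting coefficients are not those reported above:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are synthetic, only the names mirror the notebook.
rng = np.random.default_rng(0)
n = 200
feature_a = rng.normal(size=n)
toy_df = pd.DataFrame({
    "wtd_std_ThermalConductivity": feature_a,
    "wtd_gmean_Valence": -feature_a + rng.normal(scale=0.5, size=n),
    "mean_atomic_mass": rng.normal(size=n),
})
toy_df["critical_temp_x"] = 2.0 * feature_a + rng.normal(scale=0.5, size=n)

target = "critical_temp_x"
# Pearson correlation of every feature with the target, with the target itself dropped.
target_corr = toy_df.corr()[target].drop(target)
# Order by absolute strength so strong negative relationships are not buried.
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)

print(ranked)
```

Sorting by absolute value keeps a feature such as the weighted geometric mean of valence, whose relationship is negative, near the top of the list alongside the strongest positive ones.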

Examine the correlations among the features¶

In [29]:
correlations = merged_df_final.corr()

#set the correlation threshold
threshold = 0.90

#Create empty lists to store strong relationships 
strong_pos_corr = []
strong_neg_corr = []

# Iterate through the correlation matrix
for feature1 in correlations.columns:
    for feature2 in correlations.index:
        if feature1 !=feature2: # Avoid comparing a feature with itself
            corr_value = correlations.loc[feature2, feature1]
            if corr_value > threshold:
                strong_pos_corr.append((feature2, feature1, corr_value))
            elif corr_value < -threshold:
                strong_neg_corr.append((feature2, feature1, corr_value))

                
# Sort the correlations alphabetically
strong_pos_corr.sort()
strong_neg_corr.sort()
                
# Format results as table
pos_table = tabulate(strong_pos_corr, headers=["Feature 1", "Feature 2", "Correlation"], tablefmt="grid")
neg_table = tabulate(strong_neg_corr, headers=["Feature 1", "Feature 2", "Correlation"], tablefmt="grid")

#Display strong pos correlations
print("Strong Positive Correlations:")
print(pos_table)

print()

#Display strong neg correlations
print("Strong Negative Correlations:")
print(neg_table)
   
Strong Positive Correlations:
+-----------------------------+-----------------------------+---------------+
| Feature 1                   | Feature 2                   |   Correlation |
+=============================+=============================+===============+
| entropy_Density             | entropy_FusionHeat          |      0.917732 |
+-----------------------------+-----------------------------+---------------+
| entropy_Density             | entropy_Valence             |      0.900579 |
+-----------------------------+-----------------------------+---------------+
| entropy_Density             | entropy_atomic_mass         |      0.932668 |
+-----------------------------+-----------------------------+---------------+
| entropy_Density             | entropy_atomic_radius       |      0.91555  |
+-----------------------------+-----------------------------+---------------+
| entropy_Density             | entropy_fie                 |      0.902037 |
+-----------------------------+-----------------------------+---------------+
| entropy_ElectronAffinity    | entropy_Valence             |      0.904659 |
+-----------------------------+-----------------------------+---------------+
| entropy_ElectronAffinity    | entropy_atomic_radius       |      0.909744 |
+-----------------------------+-----------------------------+---------------+
| entropy_ElectronAffinity    | entropy_fie                 |      0.912862 |
+-----------------------------+-----------------------------+---------------+
| entropy_FusionHeat          | entropy_Density             |      0.917732 |
+-----------------------------+-----------------------------+---------------+
| entropy_FusionHeat          | entropy_Valence             |      0.921445 |
+-----------------------------+-----------------------------+---------------+
| entropy_FusionHeat          | entropy_atomic_mass         |      0.928251 |
+-----------------------------+-----------------------------+---------------+
| entropy_FusionHeat          | entropy_atomic_radius       |      0.930294 |
+-----------------------------+-----------------------------+---------------+
| entropy_FusionHeat          | entropy_fie                 |      0.916592 |
+-----------------------------+-----------------------------+---------------+
| entropy_FusionHeat          | number_of_elements          |      0.900759 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | entropy_Density             |      0.900579 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | entropy_ElectronAffinity    |      0.904659 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | entropy_FusionHeat          |      0.921445 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | entropy_atomic_mass         |      0.963621 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | entropy_atomic_radius       |      0.989546 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | entropy_fie                 |      0.992726 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | number_of_elements          |      0.967832 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | wtd_entropy_Valence         |      0.910822 |
+-----------------------------+-----------------------------+---------------+
| entropy_Valence             | wtd_entropy_atomic_radius   |      0.919184 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_mass         | entropy_Density             |      0.932668 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_mass         | entropy_FusionHeat          |      0.928251 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_mass         | entropy_Valence             |      0.963621 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_mass         | entropy_atomic_radius       |      0.972329 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_mass         | entropy_fie                 |      0.964695 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_mass         | number_of_elements          |      0.939304 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_radius       | entropy_Density             |      0.91555  |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_radius       | entropy_ElectronAffinity    |      0.909744 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_radius       | entropy_FusionHeat          |      0.930294 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_radius       | entropy_Valence             |      0.989546 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_radius       | entropy_atomic_mass         |      0.972329 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_radius       | entropy_fie                 |      0.997739 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_radius       | number_of_elements          |      0.972245 |
+-----------------------------+-----------------------------+---------------+
| entropy_atomic_radius       | wtd_entropy_atomic_radius   |      0.914223 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | entropy_Density             |      0.902037 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | entropy_ElectronAffinity    |      0.912862 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | entropy_FusionHeat          |      0.916592 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | entropy_Valence             |      0.992726 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | entropy_atomic_mass         |      0.964695 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | entropy_atomic_radius       |      0.997739 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | number_of_elements          |      0.973195 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | wtd_entropy_Valence         |      0.907923 |
+-----------------------------+-----------------------------+---------------+
| entropy_fie                 | wtd_entropy_atomic_radius   |      0.920192 |
+-----------------------------+-----------------------------+---------------+
| gmean_Density               | wtd_gmean_Density           |      0.951995 |
+-----------------------------+-----------------------------+---------------+
| gmean_FusionHeat            | mean_FusionHeat             |      0.926769 |
+-----------------------------+-----------------------------+---------------+
| gmean_Valence               | mean_Valence                |      0.989911 |
+-----------------------------+-----------------------------+---------------+
| gmean_Valence               | wtd_gmean_Valence           |      0.933036 |
+-----------------------------+-----------------------------+---------------+
| gmean_Valence               | wtd_mean_Valence            |      0.917905 |
+-----------------------------+-----------------------------+---------------+
| gmean_atomic_mass           | mean_atomic_mass            |      0.940298 |
+-----------------------------+-----------------------------+---------------+
| gmean_atomic_radius         | mean_atomic_radius          |      0.915931 |
+-----------------------------+-----------------------------+---------------+
| gmean_fie                   | mean_fie                    |      0.969325 |
+-----------------------------+-----------------------------+---------------+
| mean_FusionHeat             | gmean_FusionHeat            |      0.926769 |
+-----------------------------+-----------------------------+---------------+
| mean_FusionHeat             | wtd_mean_FusionHeat         |      0.909575 |
+-----------------------------+-----------------------------+---------------+
| mean_Valence                | gmean_Valence               |      0.989911 |
+-----------------------------+-----------------------------+---------------+
| mean_Valence                | wtd_gmean_Valence           |      0.940001 |
+-----------------------------+-----------------------------+---------------+
| mean_Valence                | wtd_mean_Valence            |      0.937103 |
+-----------------------------+-----------------------------+---------------+
| mean_atomic_mass            | gmean_atomic_mass           |      0.940298 |
+-----------------------------+-----------------------------+---------------+
| mean_atomic_radius          | gmean_atomic_radius         |      0.915931 |
+-----------------------------+-----------------------------+---------------+
| mean_fie                    | gmean_fie                   |      0.969325 |
+-----------------------------+-----------------------------+---------------+
| number_of_elements          | entropy_FusionHeat          |      0.900759 |
+-----------------------------+-----------------------------+---------------+
| number_of_elements          | entropy_Valence             |      0.967832 |
+-----------------------------+-----------------------------+---------------+
| number_of_elements          | entropy_atomic_mass         |      0.939304 |
+-----------------------------+-----------------------------+---------------+
| number_of_elements          | entropy_atomic_radius       |      0.972245 |
+-----------------------------+-----------------------------+---------------+
| number_of_elements          | entropy_fie                 |      0.973195 |
+-----------------------------+-----------------------------+---------------+
| number_of_elements          | wtd_entropy_atomic_radius   |      0.904121 |
+-----------------------------+-----------------------------+---------------+
| range_Density               | std_Density                 |      0.959956 |
+-----------------------------+-----------------------------+---------------+
| range_Density               | wtd_std_Density             |      0.907307 |
+-----------------------------+-----------------------------+---------------+
| range_ElectronAffinity      | std_ElectronAffinity        |      0.973114 |
+-----------------------------+-----------------------------+---------------+
| range_FusionHeat            | std_FusionHeat              |      0.984574 |
+-----------------------------+-----------------------------+---------------+
| range_FusionHeat            | wtd_std_FusionHeat          |      0.925642 |
+-----------------------------+-----------------------------+---------------+
| range_ThermalConductivity   | std_ThermalConductivity     |      0.987867 |
+-----------------------------+-----------------------------+---------------+
| range_ThermalConductivity   | wtd_std_ThermalConductivity |      0.965449 |
+-----------------------------+-----------------------------+---------------+
| range_Valence               | std_Valence                 |      0.973788 |
+-----------------------------+-----------------------------+---------------+
| range_atomic_mass           | std_atomic_mass             |      0.960854 |
+-----------------------------+-----------------------------+---------------+
| range_atomic_mass           | wtd_std_atomic_mass         |      0.918152 |
+-----------------------------+-----------------------------+---------------+
| range_atomic_radius         | range_fie                   |      0.908734 |
+-----------------------------+-----------------------------+---------------+
| range_atomic_radius         | std_atomic_radius           |      0.967428 |
+-----------------------------+-----------------------------+---------------+
| range_atomic_radius         | wtd_std_atomic_radius       |      0.958004 |
+-----------------------------+-----------------------------+---------------+
| range_fie                   | range_atomic_radius         |      0.908734 |
+-----------------------------+-----------------------------+---------------+
| range_fie                   | std_fie                     |      0.981628 |
+-----------------------------+-----------------------------+---------------+
| range_fie                   | wtd_std_fie                 |      0.940281 |
+-----------------------------+-----------------------------+---------------+
| std_Density                 | range_Density               |      0.959956 |
+-----------------------------+-----------------------------+---------------+
| std_Density                 | wtd_std_Density             |      0.905669 |
+-----------------------------+-----------------------------+---------------+
| std_ElectronAffinity        | range_ElectronAffinity      |      0.973114 |
+-----------------------------+-----------------------------+---------------+
| std_FusionHeat              | range_FusionHeat            |      0.984574 |
+-----------------------------+-----------------------------+---------------+
| std_FusionHeat              | wtd_std_FusionHeat          |      0.940183 |
+-----------------------------+-----------------------------+---------------+
| std_ThermalConductivity     | range_ThermalConductivity   |      0.987867 |
+-----------------------------+-----------------------------+---------------+
| std_ThermalConductivity     | wtd_std_ThermalConductivity |      0.955627 |
+-----------------------------+-----------------------------+---------------+
| std_Valence                 | range_Valence               |      0.973788 |
+-----------------------------+-----------------------------+---------------+
| std_atomic_mass             | range_atomic_mass           |      0.960854 |
+-----------------------------+-----------------------------+---------------+
| std_atomic_mass             | wtd_std_atomic_mass         |      0.919788 |
+-----------------------------+-----------------------------+---------------+
| std_atomic_radius           | range_atomic_radius         |      0.967428 |
+-----------------------------+-----------------------------+---------------+
| std_atomic_radius           | wtd_std_atomic_radius       |      0.944536 |
+-----------------------------+-----------------------------+---------------+
| std_fie                     | range_fie                   |      0.981628 |
+-----------------------------+-----------------------------+---------------+
| std_fie                     | wtd_std_fie                 |      0.934255 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_FusionHeat      | wtd_entropy_Valence         |      0.908728 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_FusionHeat      | wtd_entropy_atomic_radius   |      0.90786  |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_Valence         | entropy_Valence             |      0.910822 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_Valence         | entropy_fie                 |      0.907923 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_Valence         | wtd_entropy_FusionHeat      |      0.908728 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_Valence         | wtd_entropy_atomic_mass     |      0.918284 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_Valence         | wtd_entropy_atomic_radius   |      0.951463 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_mass     | wtd_entropy_Valence         |      0.918284 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_mass     | wtd_entropy_atomic_radius   |      0.961464 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_radius   | entropy_Valence             |      0.919184 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_radius   | entropy_atomic_radius       |      0.914223 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_radius   | entropy_fie                 |      0.920192 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_radius   | number_of_elements          |      0.904121 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_radius   | wtd_entropy_FusionHeat      |      0.90786  |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_radius   | wtd_entropy_Valence         |      0.951463 |
+-----------------------------+-----------------------------+---------------+
| wtd_entropy_atomic_radius   | wtd_entropy_atomic_mass     |      0.961464 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_Density           | gmean_Density               |      0.951995 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_Density           | wtd_mean_Density            |      0.941502 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_FusionHeat        | wtd_mean_FusionHeat         |      0.970948 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_Valence           | gmean_Valence               |      0.933036 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_Valence           | mean_Valence                |      0.940001 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_Valence           | wtd_mean_Valence            |      0.994939 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_atomic_mass       | wtd_mean_atomic_mass        |      0.964085 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_atomic_radius     | wtd_mean_atomic_radius      |      0.980107 |
+-----------------------------+-----------------------------+---------------+
| wtd_gmean_fie               | wtd_mean_fie                |      0.992331 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_Density            | wtd_gmean_Density           |      0.941502 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_FusionHeat         | mean_FusionHeat             |      0.909575 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_FusionHeat         | wtd_gmean_FusionHeat        |      0.970948 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_Valence            | gmean_Valence               |      0.917905 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_Valence            | mean_Valence                |      0.937103 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_Valence            | wtd_gmean_Valence           |      0.994939 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_atomic_mass        | wtd_gmean_atomic_mass       |      0.964085 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_atomic_radius      | wtd_gmean_atomic_radius     |      0.980107 |
+-----------------------------+-----------------------------+---------------+
| wtd_mean_fie                | wtd_gmean_fie               |      0.992331 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_Density             | range_Density               |      0.907307 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_Density             | std_Density                 |      0.905669 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_FusionHeat          | range_FusionHeat            |      0.925642 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_FusionHeat          | std_FusionHeat              |      0.940183 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_ThermalConductivity | range_ThermalConductivity   |      0.965449 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_ThermalConductivity | std_ThermalConductivity     |      0.955627 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_atomic_mass         | range_atomic_mass           |      0.918152 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_atomic_mass         | std_atomic_mass             |      0.919788 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_atomic_radius       | range_atomic_radius         |      0.958004 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_atomic_radius       | std_atomic_radius           |      0.944536 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_atomic_radius       | wtd_std_fie                 |      0.922258 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_fie                 | range_fie                   |      0.940281 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_fie                 | std_fie                     |      0.934255 |
+-----------------------------+-----------------------------+---------------+
| wtd_std_fie                 | wtd_std_atomic_radius       |      0.922258 |
+-----------------------------+-----------------------------+---------------+

Strong Negative Correlations:
+-------------------------+-------------------------+---------------+
| Feature 1               | Feature 2               |   Correlation |
+=========================+=========================+===============+
| wtd_gmean_atomic_radius | wtd_mean_fie            |     -0.914255 |
+-------------------------+-------------------------+---------------+
| wtd_mean_fie            | wtd_gmean_atomic_radius |     -0.914255 |
+-------------------------+-------------------------+---------------+

From the correlation table, we can see that the entropy-based features are very strongly positively correlated with one another, with most pairs exceeding 0.90. Among the entropy features, the strongest correlation is between the entropy of atomic radius and the entropy of FIE at 0.998, followed by the entropy of Valence and the entropy of FIE at 0.993. The only strong negative correlation that we observe is between the weighted geometric mean of atomic radius and the weighted mean of FIE at -0.914; it appears twice in the table because each pair is listed in both orders. It makes logical sense that features derived from the same underlying property are strongly correlated with one another.
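
Every pair in the tables above is listed twice, once as (A, B) and once as (B, A). A minimal sketch of how the listing could be reduced to unique pairs, assuming the correlations come from a pandas `DataFrame.corr()` call. The helper name `strong_pairs`, the 0.9 threshold, and the toy data are illustrative assumptions, not taken from the notebook:

```python
import numpy as np
import pandas as pd


def strong_pairs(df, threshold=0.9):
    """Return each strongly correlated feature pair once, with its correlation."""
    corr = df.corr()
    # Keep only entries strictly above the diagonal so (A, B) and (B, A) are not both listed
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().reset_index()
    pairs.columns = ["Feature 1", "Feature 2", "Correlation"]
    # The absolute value catches strong negative correlations as well
    return pairs[pairs["Correlation"].abs() >= threshold].reset_index(drop=True)


# Toy data (hypothetical): b tracks a closely, c is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a + 0.01 * rng.normal(size=200),
    "c": rng.normal(size=200),
})
result = strong_pairs(toy)
print(result)
```

Applied to the full feature table, a filter of this kind would roughly halve the listing without losing any information.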

Modeling¶

In [30]:
# Imported here so the cell runs on its own
from sklearn.metrics import mean_squared_error

# Split the dataset into features (x) and target (y)
data_x = merged_df_final.drop(columns='critical_temp_x')
data_y = merged_df_final[['critical_temp_x']]

# Normalize the target to the [0, 1] range (the features are left unscaled)
scaler = MinMaxScaler()

norm_y = scaler.fit_transform(data_y)

# Split the data into train and test datasets
x_train, x_test, y_train, y_test = train_test_split(data_x, norm_y, test_size=0.2, random_state=10)

#train the model
lasso_model = Lasso(alpha=0.01, max_iter=100000)
lasso_model.fit(x_train,y_train)

#test the model
y_pred = lasso_model.predict(x_test)

# Inverse transform the normalized values back to the original temperature scale
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))
y_pred_actual = scaler.inverse_transform(y_pred.reshape(-1, 1))

# Find the MSE of the test set on the original scale
mse = mean_squared_error(y_test_actual, y_pred_actual)
print(f"The LASSO MSE is {mse}")

# Create a DataFrame for plotting
plot_data = pd.DataFrame({'Actual': y_test_actual.flatten(), 'Predicted': y_pred_actual.flatten()})

# Create a scatter plot
fig = px.scatter(plot_data, x='Actual', y='Predicted', title='Lasso Model Performance')

# Customize the plot
fig.update_layout(
    xaxis_title='Actual Critical Temperature',
    yaxis_title='Predicted Critical Temperature',
    showlegend=True,
    legend_title='Data Points',
    width=600,
    height=500
)

# Show the plot
fig.show()
The LASSO MSE is 336.8236368162868
In [31]:
# Get the coefficients of the Lasso model
lasso_coeffs = lasso_model.coef_

# Create a DataFrame to display the coefficients along with the corresponding feature names
coefficients_df = pd.DataFrame({'Feature': data_x.columns, 'Coefficient': lasso_coeffs})

# Sort the coefficients by absolute value to visualize importance
coefficients_df['Abs_Coefficient'] = abs(coefficients_df['Coefficient'])
coefficients_df = coefficients_df.sort_values(by='Abs_Coefficient', ascending=False)

# Display the top N important features (e.g., top 10)
top_n = 10
top_features = coefficients_df.head(top_n)

# Print the top 10 important features
print(f"Top {top_n} Important Features:")
print(top_features)

# Create a bar plot to visualize the top N important features

fig_coeffs = go.Figure()
fig_coeffs.add_trace(go.Bar(x=top_features['Feature'], y=top_features['Abs_Coefficient']))

fig_coeffs.update_layout(
    title=f'Top {top_n} Important Features - Lasso Model',
    xaxis_title='Feature',
    yaxis_title='Absolute Coefficient',
    width=800,
    height=400
)

# Show the bar plot for coefficients
fig_coeffs.show()
Top 10 Important Features:
                           Feature  Coefficient  Abs_Coefficient
136                             Ba     0.014870         0.014870
49            std_ElectronAffinity     0.004856         0.004856
44      wtd_gmean_ElectronAffinity    -0.002778         0.002778
62    wtd_mean_ThermalConductivity     0.002662         0.002662
88                               O     0.002382         0.002382
42       wtd_mean_ElectronAffinity     0.002311         0.002311
50        wtd_std_ElectronAffinity    -0.001944         0.001944
64   wtd_gmean_ThermalConductivity    -0.001941         0.001941
47          range_ElectronAffinity    -0.001800         0.001800
10             wtd_std_atomic_mass    -0.001728         0.001728
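
Ranking features by absolute coefficient is only meaningful when the features share a common scale. In the cell above the target is min-max scaled but the features are passed to Lasso unscaled, so a feature measured in large units can receive a small coefficient regardless of its influence. A minimal sketch of one way to address this, assuming a `StandardScaler` step is acceptable for the study. The synthetic data and names such as `big_units` are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
# Two equally influential features on very different scales, plus one irrelevant feature
X = pd.DataFrame({
    "small_units": rng.normal(0, 1, n),
    "big_units": rng.normal(0, 1000, n),
    "noise": rng.normal(0, 1, n),
})
y = 3.0 * X["small_units"] + 0.003 * X["big_units"] + rng.normal(0, 0.1, n)

# Raw fit: coefficients reflect the units of each feature, not its influence
raw = Lasso(alpha=0.01, max_iter=100000).fit(X, y)

# Standardized fit: each coefficient is the effect of a one-standard-deviation change
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.01, max_iter=100000)).fit(X, y)

coefs = pd.DataFrame({
    "raw": raw.coef_,
    "standardized": scaled.named_steps["lasso"].coef_,
}, index=X.columns)
print(coefs)
```

In the raw fit `big_units` looks negligible next to `small_units`; after standardization the two coefficients are of similar size, which matches how the toy target was built. The same caution may apply to the top-10 list above.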
In [32]:
# Train the Ridge model
ridge_model = Ridge(alpha=0.01)  # You can adjust the alpha parameter
ridge_model.fit(x_train, y_train)

# Test the Ridge model
y_pred_ridge = ridge_model.predict(x_test)

# Inverse transform the normalized Ridge predictions
y_pred_ridge_actual = scaler.inverse_transform(y_pred_ridge.reshape(-1, 1))

# Find the MSE of the test
mse_ridge = mean_squared_error(y_test_actual, y_pred_ridge_actual)
print(f"The Ridge MSE is {mse_ridge}")

# Create a DataFrame for plotting
plot_data_ridge = pd.DataFrame({'Actual': y_test_actual.flatten(), 'Predicted': y_pred_ridge_actual.flatten()})

# Create a scatter plot for Ridge model performance
fig_ridge = px.scatter(plot_data_ridge, x='Actual', y='Predicted', title='Ridge Model Performance')

# Customize the Ridge plot
fig_ridge.update_layout(
    xaxis_title='Actual Critical Temperature',
    yaxis_title='Predicted Critical Temperature',
    showlegend=True,
    legend_title='Data Points',
    width=600,
    height=500
)

# Show the Ridge plot
fig_ridge.show()
The Ridge MSE is 289.35813124754634
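
MSE is expressed in squared temperature units, which is hard to read directly. Taking the square root gives a typical error size in the units of the target (assumed here to be kelvin, which the notebook does not state). A small sketch using the two MSE values printed above; the variable names are illustrative:

```python
import math

# MSE values copied from the Lasso and Ridge cell outputs
mse_by_model = {"Lasso": 336.8236368162868, "Ridge": 289.35813124754634}

# RMSE is the square root of MSE, expressed in the same units as the target
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in rmse_by_model.items():
    print(f"{name} RMSE: {rmse:.2f}")
```

This works out to roughly 18.4 for Lasso and 17.0 for Ridge, so on this split Ridge is off by about 1.3 units less than Lasso on a typical prediction.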
In [33]:
# Refit the Ridge model with the same settings as the previous cell so this cell runs on its own
ridge_model = Ridge(alpha=0.01)  # You can adjust the alpha parameter
ridge_model.fit(x_train, y_train)

# Get the coefficients of the Ridge model
ridge_coeffs = ridge_model.coef_[0]  # y_train is 2-D, so coef_ has shape (1, n_features); take the first row

# Create a Series to display the coefficients along with the corresponding feature names
coefficients_ridge_series = pd.Series(ridge_coeffs, index=data_x.columns)

# Sort the coefficients by absolute value to visualize importance
coefficients_ridge_series_abs = coefficients_ridge_series.abs().sort_values(ascending=False)

# Display the top N important features (e.g., top 10)
top_n_ridge = 10
top_features_ridge = coefficients_ridge_series_abs.head(top_n_ridge)

# Print the top N important features for Ridge
print(f"Top {top_n_ridge} Important Features (Ridge):")
print(top_features_ridge)

# Create a bar plot to visualize the top N important features for Ridge

fig_coeffs_ridge = go.Figure()
fig_coeffs_ridge.add_trace(go.Bar(x=top_features_ridge.index, y=top_features_ridge.values))

fig_coeffs_ridge.update_layout(
    title=f'Top {top_n_ridge} Important Features - Ridge Model',
    xaxis_title='Feature',
    yaxis_title='Absolute Coefficient',
    width=800,
    height=400
)

# Show the bar plot for coefficients for Ridge
fig_coeffs_ridge.show()
Top 10 Important Features (Ridge):
entropy_Valence                0.373793
wtd_entropy_Valence            0.341844
wtd_entropy_fie                0.267281
wtd_entropy_FusionHeat         0.138216
entropy_fie                    0.119500
entropy_FusionHeat             0.114483
entropy_atomic_mass            0.114102
wtd_entropy_ElectronAffinity   0.106014
entropy_atomic_radius          0.105997
wtd_std_Valence                0.093391
dtype: float64
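
The Ridge ranking above is dominated by entropy features, which the correlation table shows to be nearly collinear. When two features carry almost the same information, Ridge tends to share the weight between them, so the individual coefficients are hard to interpret separately. A minimal sketch on synthetic data; the variable names and the `alpha=1.0` setting are illustrative and are not the notebook's:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)  # near-duplicate of x1 (correlation close to 1)
X = np.column_stack([x1, x2])

# Only x1 drives the target in this toy setup
y = 2.0 * x1 + 0.1 * rng.normal(size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
print("Ridge coefficients:", ridge.coef_)
print("Sum of coefficients:", ridge.coef_.sum())
```

Only `x1` drives the toy target, yet both coefficients come out close to 1 and only their sum recovers the true effect of 2. A similar effect may be at play among the entropy features, so their Ridge coefficients are probably better read as a group than one by one.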